OCPBUGS-49764: bindata/alerts/slo: improve burnrate calculation #1744
Conversation
/cc
That makes sense to me, other burnrates (
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting. If this issue is safe to close now please do so with /lifecycle stale
Calculate the request burn rate based on the total number of read+write requests instead of separately calculating the burn rate for each request type. This used to cause an erroneous result when summing the read and write burn rates together, as it wouldn't account for the proportion of failures among all requests. Signed-off-by: Damien Grisonnet <[email protected]>
56d01d8 to 275f05d (compare)
/remove-lifecycle stale
/retest-required
@dgrisonnet: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dgrisonnet, vrutkovs The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@dgrisonnet: This pull request references Jira Issue OCPBUGS-49764, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/jira refresh
@dgrisonnet: This pull request references Jira Issue OCPBUGS-49764, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
/label acknowledge-critical-fixes-only
/cherry-pick release-4.18
@dgrisonnet: once the present PR merges, I will cherry-pick it on top of In response to this:
/retest-required
Merged f90c0d9 into openshift:master
@dgrisonnet: Jira Issue OCPBUGS-49764: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-49764 has been moved to the MODIFIED state. In response to this:
@dgrisonnet: new pull request created: #1797 In response to this:
[ART PR BUILD NOTIFIER] Distgit: ose-cluster-kube-apiserver-operator
The problem that I recently noticed with the existing expression is that when we compute the overall burn rate from write and read requests, we take the ratio of failed read requests and sum it with that of write requests. But both of these ratios are calculated against their own request type, not the total number of requests, so summing them is only correct when the proportions of write and read requests are equal.

For example, let's imagine a scenario where 40% of requests are write requests and their success rate during a disruption is only 50%, while read requests fail 1 time out of 6 (roughly 83% success). Out of 10 requests, that gives 4 writes with 2 failures and 6 reads with 1 failure.

apiserver_request:burnrate1h{verb="write"} would be equal to 2/4, and apiserver_request:burnrate1h{verb="read"} would be 1/6. The sum of these, as computed by the alert today, would be 2/4 + 1/6 = 2/3, when in reality the ratio of failed requests should be 2/10 + 1/10 = 3/10. So there is quite a huge difference today when we don't account for the total number of requests.
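The arithmetic in the scenario above can be checked with a short sketch. The request counts are the hypothetical ones from the example (10 requests: 4 writes with 2 failures, 6 reads with 1 failure), not real metric values:

```python
# Hypothetical counts from the example: 10 requests total.
writes_total, writes_failed = 4, 2   # 50% write success
reads_total, reads_failed = 6, 1     # ~83% read success

# Old approach: each burn rate is computed against its own
# request type, then the two per-type ratios are summed.
burnrate_write = writes_failed / writes_total   # 2/4
burnrate_read = reads_failed / reads_total      # 1/6
old_combined = burnrate_write + burnrate_read   # 2/3

# Fixed approach: all failures over the total number of requests.
total = writes_total + reads_total
new_combined = (writes_failed + reads_failed) / total   # 3/10

print(old_combined, new_combined)
```

The old sum overstates the burn rate by more than 2x here, because the write failures are weighted against only 4 requests instead of all 10.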